In this paper, we present an open-source
parsing environment (Tübingen Linguistic
Parsing Architecture, TuLiPA) which uses
Range Concatenation Grammar (RCG) 
as a pivot formalism, thus opening the 
way to the parsing of several mildly 
context-sensitive formalisms. This en- 
vironment currently supports tree-based 
grammars (namely Tree-Adjoining Gram- 
mars (TAG) and Multi-Component Tree- 
Adjoining Grammars with Tree Tuples 
(TT-MCTAG)) and allows computation not 
only of syntactic structures, but also of the 
corresponding semantic representations. It 
is used for the development of a tree-based 
grammar for German. 

1 Introduction 

Grammars and lexicons represent important lin- 
guistic resources for many NLP applications, 
among which one may cite dialog systems, auto- 
matic summarization or machine translation. De- 
veloping such resources is known to be a complex 
task that needs useful tools such as parsers and
generators (Erbach, 1992).

Furthermore, there is a lack of a common frame- 
work allowing for multi-formalism grammar engi- 
neering. Thus, many formalisms have been pro- 
posed to model natural language, each coming 
with specific implementations. Having a common
framework would facilitate the comparison
between formalisms (e.g., in terms of parsing
complexity in practice), and would allow for a better
sharing of resources (e.g., having a common lex- 
icon, from which different features would be ex- 
tracted depending on the target formalism). 

In this context, we present a parsing environment
relying on a general architecture that can
be used for parsing with mildly context-sensitive
(MCS) formalisms¹ (Joshi, 1987). Its underlying
idea is to use Range Concatenation Grammar
(RCG) as a pivot formalism, for RCG has been
shown to strictly include MCS languages while
being parsable in polynomial time (Boullier, 2000).

Currently, this architecture supports tree-based 
grammars (Tree-Adjoining Grammars and
Multi-Component Tree-Adjoining Grammars with Tree
Tuples (Lichte, 2007)). More precisely, tree-
based grammars are first converted into equivalent 
RCGs, which are then used for parsing. The result 
of RCG parsing is finally interpreted to extract a 
derivation structure for the input grammar, as well 
as to perform additional processing (e.g., semantic
calculus, extraction of dependency views).

The paper is structured as follows. In section 2,
we present the architecture of the TuLiPA parsing
environment and show how the use of RCG as a
pivot formalism makes it easier to design a modular
system that can be extended to support several
dimensions (syntax, semantics) and/or formalisms.
In section 3, we give some desiderata for grammar
engineering and present TuLiPA's current state
with respect to these. In section 4, we compare
this system with existing approaches for parsing
and more generally for grammar engineering. Finally,
in section 5, we conclude by presenting future work.

¹ A formalism is said to be mildly context sensitive (MCS)
iff (i) it generates limited cross-serial dependencies, (ii) it is
polynomially parsable, and (iii) the string languages generated
by the formalism have the constant growth property (e.g.,
$\{a^{2^n} \mid n \geq 0\}$ does not have this property). Examples of MCS
formalisms include Tree-Adjoining Grammars, Combinatory
Categorial Grammars and Linear Indexed Grammars.

2 Range Concatenation Grammar as a 
pivot formalism 

The main idea underlying TuLiPA is to use RCG
as a pivot formalism, for RCG has appealing formal
properties (e.g., a generative capacity lying
beyond Linear Context-Free Rewriting Systems
and a polynomial parsing complexity) and
there exist efficient algorithms for RCG parsing
(Boullier, 2000) and for grammar transformation
into RCG (Boullier, 1998; Boullier, 1999).
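To make the notion of range concatenation concrete, the following minimal Python sketch recognizes the non-context-free language {a^n b^n c^n}, using the simple RCG S(XYZ) → A(X,Y,Z); A(aX,bY,cZ) → A(X,Y,Z); A(ε,ε,ε) → ε. Both the toy grammar and the hand-instantiated recognizer are illustrative assumptions; this is neither TuLiPA's parser nor the algorithm of Boullier (2000).

```python
from functools import lru_cache


def recognize(w: str) -> bool:
    """Recognize {a^n b^n c^n} with predicates over ranges (start, end) of w."""
    n = len(w)

    @lru_cache(maxsize=None)
    def pred_a(x, y, z):
        (xs, xe), (ys, ye), (zs, ze) = x, y, z
        # Clause A(eps, eps, eps) -> eps: all three ranges are empty.
        if xs == xe and ys == ye and zs == ze:
            return True
        # Clause A(aX, bY, cZ) -> A(X, Y, Z): strip one terminal per range.
        if (xs < xe and ys < ye and zs < ze
                and w[xs] == "a" and w[ys] == "b" and w[zs] == "c"):
            return pred_a((xs + 1, xe), (ys + 1, ye), (zs + 1, ze))
        return False

    # Clause S(XYZ) -> A(X, Y, Z): try every split of the range (0, n).
    return any(
        pred_a((0, i), (i, j), (j, n))
        for i in range(n + 1)
        for j in range(i, n + 1)
    )
```

Memoizing the predicate on its range arguments reflects why RCG recognition stays polynomial: each predicate instance is evaluated at most once per tuple of ranges.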



Parsing with TuLiPA is thus a 3-step process: 

1. The input tree-based grammar is converted
into an RCG (using the algorithm of
Kallmeyer and Parmentier (2008) when dealing
with TT-MCTAG).

2. The resulting RCG is used for parsing the input
string using an extension of the parsing
algorithm of Boullier (2000).

3. The RCG derivation structure is interpreted to 
extract the derivation and derived trees with 
respect to the input grammar. 

The use of RCG as a pivot formalism, and thus 
of an RCG parser as a core component of the system,
leads to a modular architecture. In turn, this
makes TuLiPA more easily extensible, either in 
terms of functionalities, or in terms of formalisms. 

2.1 Adding functionalities to the parsing 
environment 

As an illustration of TuLiPA's extensibility, one
may consider two extensions recently applied to
the system.

First, a semantic calculus using the syntax/semantics
interface for TAG proposed by Gardent
and Kallmeyer (2003) has been added. This
interface associates each tree with flat semantic
formulas. The arguments of these formulas are
unification variables, which are co-indexed with
features labelling the nodes of the syntactic tree.
During classical TAG derivation, trees are combined,
triggering unifications of the feature structures
labelling nodes. As a result of these unifications,
the arguments of the semantic formulas are
unified (see Fig. 1).



[Figure content: the elementary trees for John, loves and Mary carry the flat formulas name(j,john), love(x,y) and name(m,mary); after derivation, the result is love(j,m), name(j,john), name(m,mary).]

Figure 1: Semantic calculus in Feature-Based
TAG.

In our system, the semantic support has been in- 
tegrated by (i) extending the internal tree objects to 
include semantic formulas (the RCG-conversion is 
kept unchanged), and (ii) extending the construc- 
tion of the derived tree (step 3) so that during the 
interpretation of the RCG derivation in terms of 
tree combinations, the semantic formulas are car- 
ried and updated with respect to the feature unifi- 
cations performed. 
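The effect of these unifications on the semantic formulas can be sketched as follows for the example of Fig. 1. This is a minimal illustration in Python, not TuLiPA's implementation: the data layout (dictionaries with hypothetical `sem`, `slots` and `idx` keys) and the convention that variables start with `?` are assumptions made for the example.

```python
def resolve(term, bindings):
    """Follow variable bindings until reaching an unbound variable or a constant."""
    while term in bindings:
        term = bindings[term]
    return term


def unify(a, b, bindings):
    """Unify two atomic terms; variables are strings starting with '?'."""
    a, b = resolve(a, bindings), resolve(b, bindings)
    if a == b:
        return
    if a.startswith("?"):
        bindings[a] = b
    elif b.startswith("?"):
        bindings[b] = a
    else:
        raise ValueError(f"cannot unify constants {a} and {b}")


def derive():
    # Hypothetical elementary trees: flat formulas plus co-indexed node features.
    loves = {"sem": [("love", "?x", "?y")], "slots": {"subj": "?x", "obj": "?y"}}
    john = {"sem": [("name", "j", "john")], "idx": "j"}
    mary = {"sem": [("name", "m", "mary")], "idx": "m"}

    bindings = {}
    # Substituting the NP trees unifies the node features, hence the indices.
    unify(loves["slots"]["subj"], john["idx"], bindings)
    unify(loves["slots"]["obj"], mary["idx"], bindings)

    formulas = loves["sem"] + john["sem"] + mary["sem"]
    return [
        (pred, *(resolve(arg, bindings) for arg in args))
        for pred, *args in formulas
    ]
```

Calling `derive()` yields the formulas love(j,m), name(j,john) and name(m,mary), encoded as tuples.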

Secondly, let us consider lexical disambigua- 
tion. Because of the high redundancy lying within 
lexicalized formalisms such as lexicalized TAG, 
it is common to consider tree schemata having a 
frontier node marked for anchoring (i.e., lexical- 
ization). At parsing time, the tree schemata are 
anchored according to the input string. This an- 
choring selects a subgrammar supposed to cover 
the input string. Unfortunately, this subgrammar 
may contain many trees that either do not lead to 
a parse or for which we know a priori that they 
cannot be combined within the same derivation 
(so we should not predict a derivation from one 
of these trees to another during parsing). As a re- 
sult, the parser could have poor performance be- 
cause of the many derivation paths that have to 
be explored. Bonfante et al. (2004) proposed to
polarize the structures of the grammar, and to apply
an automaton-based filtering of the compatible
structures. The idea is the following. One computes
polarities representing the needs/resources brought
by a given tree (or tree tuple for TT-MCTAG).
A substitution or foot node with category NP reflects
a need for an NP (written NP-). In the same
way, an NP root node reflects a resource of type
NP (written NP+). One then builds an automaton
whose edges correspond to trees, and whose states
correspond to the polarities accumulated by the trees
along the path. The automaton is then traversed to
extract all paths leading to a final state with a neutral
polarity for each category and +1 for the axiom (see
Fig. 2, where state 7 is the only valid state and
{proper., trans., det., noun.} the only compatible
set of trees).

[Figure content: polarity automaton for the input string "John eats a cake"; its states are labelled with accumulated polarities such as S+ and NP-.]

Figure 2: Polarity-based lexical disambiguation.

In our context, this polarity filtering has been 
added before step 1, leaving untouched the core 
RCG conversion and parsing steps. The idea is 
to compute the sets of compatible trees (or tree 
tuples for TT-MCTAG) and to convert these sets 
separately. Indeed the RCG has to encode only 
valid adjunctions/substitutions. Thanks to this 
automaton-based "clustering" of the compatible
trees (or tree tuples), we avoid predicting incompatible
derivations. Note that the time saved by using
a polarity-based filter is not negligible, especially
when parsing long sentences.²
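The filtering idea can be sketched in a few lines of Python. The candidate trees and their polarities below are hypothetical values chosen for the sentence "John eats a cake"; they are not taken from an actual grammar, and the code is a simplified illustration rather than the automaton implementation used in TuLiPA or LEOPAR.

```python
# Hypothetical candidate trees per word, with the polarities they bring.
CANDIDATES = {
    "John": {"proper": {"NP": +1}},
    "eats": {"intrans": {"S": +1, "NP": -1}, "trans": {"S": +1, "NP": -2}},
    "a": {"det": {"DET": +1}},
    "cake": {"noun": {"NP": +1, "DET": -1}},
}


def compatible_tree_sets(words, axiom="S"):
    """Return the tree selections whose polarities sum to +1 for the axiom
    and to 0 for every other category."""
    # A state is the set of non-zero (category, charge) pairs reached so far;
    # each state stores the paths (tuples of tree names) leading to it.
    states = {frozenset(): [()]}
    for word in words:
        next_states = {}
        for state, paths in states.items():
            for tree, polarities in CANDIDATES[word].items():
                charge = dict(state)
                for category, polarity in polarities.items():
                    charge[category] = charge.get(category, 0) + polarity
                key = frozenset((c, v) for c, v in charge.items() if v != 0)
                next_states.setdefault(key, []).extend(
                    path + (tree,) for path in paths
                )
        states = next_states
    final_state = frozenset({(axiom, 1)})
    return sorted(states.get(final_state, []))
```

Merging paths that reach the same polarity state is what keeps the automaton compact: selections are only enumerated for the final state that satisfies the neutrality condition.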

2.2 Adding formalisms to the parsing 
environment 

Of course, the two extensions introduced in the
previous section could have been added to other
modular architectures as well. The main gain
brought by RCG is the possibility to parse not
only tree-based grammars, but also other formalisms,
provided they can be encoded into RCG. In our
system, only TAG and TT-MCTAG have been
considered so far. Nonetheless, Boullier (1998)
and Søgaard (2007) have defined transformations
into RCG for other mildly context-sensitive
formalisms.³

To sum up, the idea would be to keep the core
RCG parser, and to extend TuLiPA with a specific
conversion module for each targeted formalism.
On top of these conversion modules, one should
also provide interpretation modules that decode
the RCG derivation forest in terms of the input
formalism (see Fig. 3).
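One way to picture this architecture is as a registry pairing, for each formalism, a conversion module with an interpretation module around a shared RCG core. The Python sketch below is purely illustrative: the names `FormalismModule`, `register` and `parse`, as well as the stub functions, are hypothetical and do not correspond to TuLiPA's actual (Java) classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FormalismModule:
    to_rcg: Callable[[Any], Any]      # grammar -> equivalent RCG
    interpret: Callable[[Any], Any]   # RCG derivation forest -> derivations


REGISTRY: Dict[str, FormalismModule] = {}


def register(name, to_rcg, interpret):
    """Plug a new formalism into the environment."""
    REGISTRY[name] = FormalismModule(to_rcg, interpret)


def parse(formalism, grammar, sentence, rcg_parse):
    """Convert the grammar, parse with the shared RCG core, then interpret."""
    module = REGISTRY[formalism]
    rcg = module.to_rcg(grammar)
    forest = rcg_parse(rcg, sentence)
    return module.interpret(forest)


# Stubs standing in for real modules, only to show the data flow.
def stub_rcg_parse(rcg, sentence):
    return ("forest", rcg, tuple(sentence))


register(
    "TAG",
    to_rcg=lambda grammar: ("rcg", grammar),
    interpret=lambda forest: ("tag-derivation", forest),
)
```

With this layout, supporting a further formalism amounts to one additional `register` call, while `parse` and the RCG core stay untouched.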



² An evaluation of the gain brought by this technique when
using Interaction Grammar is given by Bonfante et al. (2004).
³ These include Multi-Component Tree-Adjoining Grammar,
Linear Indexed Grammar, Head Grammar, Coupled
Context-Free Grammar, Right Linear Unification Grammar
and Synchronous Unification Grammar.




Figure 3: Towards a multi-formalism parsing envi- 
ronment. 

An important point remains to be discussed. It 
concerns the role of lexicalization with respect to 
the formalism used. Indeed, the tree-based gram- 
mar formalisms currently supported (TAG and TT- 
MCTAG) both share the same lexicalization pro- 
cess (i.e., tree anchoring). Thus the lexicon format 
is common to these formalisms. As we will see 
below, it corresponds to a 2-layer lexicon made of 
inflected forms and lemmas respectively, the latter
selecting specific grammatical structures. When 
parsing other formalisms, it is still unclear whether 
one can use the same lexicon format, and if not 
what kind of general lexicon management module 
should be added to the parser (in particular to deal 
with morphology). 

3 Towards a complete grammar 
engineering environment 

So far, we have seen how to use a generic parsing 
architecture relying on RCG to parse different for- 
malisms. In this section, we adopt a broader view 
and enumerate some requirements for a linguistic 
resource development environment. We also see 
to what extent these requirements are fulfilled (or 
partially fulfilled) within the TuLiPA system. 

3.1 Grammar engineering with TuLiPA 

As advocated by Erbach (1992), grammar engineering
needs "tools for testing the grammar
with respect to consistency, coverage, overgeneration
and accuracy". These characteristics may
be taken into account by different interacting software
tools. Thus, consistency can be checked by a
semi-automatic grammar production device, such as the
XMG system of Duchier et al. (2004). Overgeneration
is mainly checked by a generator (or by
a parser with adequate test suites), and coverage
and accuracy by a parser. In our case, the TuLiPA
system provides an entry point for using a grammar
production system (and a lexicon conversion
tool introduced below), while including a parser.
Note that TuLiPA does not include a generator;
nonetheless, it uses the same lexicon format as the
GenI surface realizer for TAG.⁴

TuLiPA's input grammar is designed using
XMG, which is a metagrammar compiler for tree-based
formalisms. In other terms, the linguist defines
a factorized description of the grammar (the
so-called metagrammar) in the XMG language.
Briefly, an XMG metagrammar consists of (i) elementary
tree fragments represented as tree description
logic formulas, and (ii) conjunctive and disjunctive
combinations of these tree fragments to
describe actual TAG tree schemata.⁵ This metagrammar
is then compiled by the XMG system to
produce a tree grammar in an XML format. Note
that the resulting grammar contains tree schemata
(i.e., unlexicalized trees). To lexicalize these, the
linguist defines a lexicon mapping words to the
corresponding sets of trees. Following XTAG (2001),
this lexicon is a 2-layer lexicon made of morphological
and lemma specifications. The motivation
for this 2-layer format is (i) to express linguistic
generalizations at the lexicon level, and (ii) to allow
the parser to select only a subgrammar according
to a given sentence, thus reducing parsing complexity.
TuLiPA comes with a lexicon conversion
tool (namely lexConverter) which makes it possible to
write a lexicon in a user-friendly text format and to
convert it into XML. An example of an entry of such a
lexicon is given in Fig. 4.

The morphological specification consists of a 
word, the corresponding lemma and morphologi- 
cal features. The main pieces of information con- 
tained in the lemma specification are the *ENTRY 
field, which refers to the lemma, the *CAT field 
referring to the syntactic category of the anchor 
node, the *SEM field containing some semantic in- 
formation allowing for semantic instantiation, the 
*FAM field, which contains the name of the tree 
family to be anchored, the *FILTERS field, which
consists of a feature structure constraining by uni- 
fication the trees of a given family that can be 
anchored by the given lemma (used for instance 
for non-passivable verbs), the *EQUATIONS field 
allowing for the definition of equations targeting 
named nodes of the trees, and the *COANCHORS 
field, which allows for the specification of co- 
anchors (such as by in the verb to come by). 



Morphological specification:

vergisst    vergessen    [pos=v,num=sg,per=3]

Lemma specification:

*ENTRY: vergessen
*CAT: V
*SEM: BinaryRel[pred=vergessen]
*ACC: 1
*FAM: Vnp2
*FILTERS: []
*EX:
*EQUATIONS:
NParg1 -> cas = nom
NParg2 -> cas = acc
*COANCHORS:

Figure 4: Morphological and lemma specification
of vergisst.
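The way such a 2-layer lexicon selects a subgrammar can be sketched as a two-step lookup: from inflected form to lemma, then from lemma to tree family. The Python dictionaries below are a hypothetical in-memory encoding of the entry of Fig. 4, used for illustration only; they do not reflect the XML format produced by lexConverter.

```python
# Layer 1 (assumed encoding): inflected form -> (lemma, morphological features).
MORPH = {
    "vergisst": [("vergessen", {"pos": "v", "num": "sg", "per": "3"})],
}

# Layer 2 (assumed encoding): lemma -> lemma specifications.
LEMMAS = {
    "vergessen": [
        {
            "cat": "V",
            "sem": "BinaryRel[pred=vergessen]",
            "fam": "Vnp2",
            "equations": {"NParg1": ("cas", "nom"), "NParg2": ("cas", "acc")},
        }
    ],
}


def select_families(sentence):
    """Return (word, lemma, tree family) triples for the words of a sentence."""
    selected = []
    for word in sentence:
        for lemma, _features in MORPH.get(word, []):
            for entry in LEMMAS.get(lemma, []):
                selected.append((word, lemma, entry["fam"]))
    return selected
```

Only the families returned for the words of the input sentence need to be anchored, which is how the lexicon restricts parsing to a subgrammar.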

From these XML resources, TuLiPA parses a 
string, corresponding either to a sentence or a con- 
stituent (noun phrase, prepositional phrase, etc.), 
and computes several output pieces of information,
namely (for TAG and TT-MCTAG): derivation/derived
trees, semantic representations (computed
from underspecified representations using
the utool software⁶), or dependency views of the
derivation trees (using the DTool software⁷).

3.2 Grammar debugging 

The engineering process introduced in the preced- 
ing section belongs to a development cycle, where 
one first designs a grammar and corresponding 
lexicons using XMG, then checks these with the 
parser, fixes them, parses again, and so on. 

To facilitate grammar debugging, TuLiPA in- 
cludes both a verbose and a robust mode allow- 
ing respectively to (i) produce a log of the RCG- 
conversion, RCG-parsing and RCG-derivation in- 
terpretation, and (ii) display mismatching features 
leading to incomplete derivations. More precisely, 
in robust mode, the parser displays derivations step 
by step, highlighting feature unification failures. 
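The kind of report produced for a feature unification failure can be illustrated with a small Python sketch. It assumes flat feature structures with atomic values, which is a simplification; the function name `unify_features` is hypothetical and the code does not reproduce TuLiPA's robust mode.

```python
def unify_features(fs1, fs2):
    """Unify two flat feature structures.

    Returns the merged structure together with the list of mismatching
    features as (feature, value in fs1, value in fs2) triples.
    """
    merged = dict(fs1)
    mismatches = []
    for feature, value in fs2.items():
        if feature in merged and merged[feature] != value:
            mismatches.append((feature, merged[feature], value))
        else:
            merged[feature] = value
    return merged, mismatches
```

For instance, unifying a node requiring `cas = nom` with a tree providing `cas = acc` reports the triple `("cas", "nom", "acc")` instead of silently discarding the derivation.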

TuLiPA's options can be activated via an intuitive
Graphical User Interface (see Fig. 5).

Figure 5: TuLiPA's Graphical User Interface.

3.3 Towards a functional common interface

Unfortunately, as mentioned above, the linguist
has to move back and forth between the grammar/lexicon
descriptions and the parser, i.e., each
time the parser reports grammar errors, the linguist
fixes these, recomputes the XML files and
then parses again. To avoid this tedious task of
resource re-compilation, we started developing an
Eclipse⁸ plug-in for the TuLiPA system. Thus, the
linguist will be able to manage all these resources,
and to call the parser, the metagrammar compiler,
and the lexConverter from a common interface (see
Fig. 6).

Figure 6: TuLiPA's Eclipse plug-in.

The motivation for this plug-in comes from
the observation that designing electronic grammars
is a task comparable to designing source
code. A powerful grammar engineering environment
should thus come with development facilities
such as precise debugging information, syntax
highlighting, etc. Using the Eclipse open-source
development platform allows for reusing several
components inherited from the software development
community, such as plug-ins for version control,
editors coupled with explorers, etc.

Finally, one point worth considering in the
context of grammar development concerns data encoding.
To our knowledge, only a few environments
provide support for UTF-8 encoding, thus guaranteeing
the coverage of a wide set of charsets and
languages. In TuLiPA, we added UTF-8 support
(in the lexConverter), thus making it possible to design
a TAG for Korean (work in progress).

⁴ See http://trac.loria.fr/~geni
⁵ See (Crabbé, 2005) for a presentation on how to use the
XMG formalism for describing a core TAG for French.
⁶ See http://www.coli.uni-saarland.de/projects/chorus/uto
with courtesy of Alexander Koller.
⁷ With courtesy of Marco Kuhlmann.

3.4 Usability of the TuLiPA system


As mentioned above, the TuLiPA system is made
of several interacting components, which one currently
has to install separately. Nonetheless, much
attention has been paid to making this installation
process as easy as possible and compatible with
all major platforms.⁹

XMG and lexConverter can be installed by compiling
their sources (using a make command).
TuLiPA is developed in Java and released as an executable
jar. No compilation is needed for it; the
only requirement is the Gecode/GecodeJ library¹⁰
(available as a binary package for many platforms).
Finally, the TuLiPA Eclipse plug-in can be installed
easily from Eclipse itself. All these tools are released
under Free software licenses (either GNU
GPL or Eclipse Public License).

This environment is being used (i) at the University
of Tübingen, in the context of the development
of a TT-MCTAG for German describing both
syntax and semantics, and (ii) at LORIA Nancy,
in the development of an XTAG-based metagrammar
for English. The German grammar, called
GerTT (for German Tree Tuples), is released under
an LGPL license for Linguistic Resources¹¹ and
is presented in (Kallmeyer et al., 2008). The test-suite
currently used to check the grammar is hand-crafted.
A more systematic evaluation of the grammar
is in preparation, using the Test Suite for Natural
Language Processing (Lehmann et al., 1996).



⁸ See http://www.eclipse.org
⁹ See http://sourcesup.cru.fr/tulipa
¹⁰ See http://www.gecode.org/gecodej
¹¹ See http://infolingu.univ-mlv.fr/DonneesLinguistiques/L



4 Comparison with existing approaches 

4.1 Engineering environments for tree-based 
grammar formalisms 

To our knowledge, there is currently no available 
parsing environment for multi-component TAG. 

Existing grammar engineering environments for
TAG include the DyALog system¹² described in
Villemonte de la Clergerie (2005). DyALog is a
compiler for a logic programming language using
tabulation and dynamic programming techniques.
This compiler has been used to implement efficient
parsing algorithms for several formalisms, including
TAG and RCG. Unfortunately, it does not include
any built-in GUI and requires a good knowledge
of the GNU build tools to compile parsers.
This makes it relatively difficult to use. DyALog's
main quality lies in its efficiency in terms of parsing
time and its capacity to handle very large resources.
Unlike TuLiPA, it does not compute semantic
representations.

The closest approach to TuLiPA is
the SemTAG system,¹³ which extends TAG parsers
compiled with DyALog with a semantic calculus
module (Gardent and Parmentier, 2007). Unlike
TuLiPA, this system only supports TAG, and
does not provide any graphical output with which to
easily check the result of parsing.

Note that, for grammar designers mainly interested
in TAG, SemTAG and TuLiPA can be seen
as complementary tools. Indeed, one may use
TuLiPA to develop the grammar and check specific
syntactic structures thanks to its intuitive parsing
environment. Once the grammar is stable, one
may use SemTAG in batch processing to parse
corpora and build semantic representations using
large grammars. Combining these two systems
is made easier by the fact that both use the
same input formats (a metagrammar in the XMG
language and a text-based lexicon). This approach
is the one being adopted for the development of a
French TAG equipped with semantics.

For Interaction Grammar (Perrier, 2000), there
exists an engineering environment gathering the
XMG metagrammar compiler and an eLEtrOstatic
PARser (LEOPAR).¹⁴ This environment is being
used to develop an Interaction Grammar for
French. TuLiPA's lexical disambiguation module
reuses techniques introduced by LEOPAR. Unlike
TuLiPA, LEOPAR does not currently support semantic
information.

4.2 Engineering environments for other 
grammar formalisms 

For other formalisms, there exist state-of-the-art 
grammar engineering environments that have been 
used for many years to design large deep grammars 
for several languages. 

For Lexical Functional Grammar, one may cite
the Xerox Linguistic Environment (XLE).¹⁵ For
Head-driven Phrase Structure Grammar, the main
available systems are the Linguistic Knowledge
Base (LKB)¹⁶ and the TRALE system.¹⁷ For
Combinatory Categorial Grammar, one may cite
the OpenCCG library¹⁸ and the C&C parser.¹⁹

These environments have been used to develop 
broad-coverage resources equipped with semantics 
and include both a generator and a parser. Un- 
like TuLiPA, they represent advanced projects, that 
have been used for dialog and machine translation 
applications. They are mainly tailored for a specific
formalism.²⁰

5 Future work 

In this section, we give some prospective views 
concerning engineering environments in general, 
and TuLiPA in particular. We first distinguish be- 
tween 2 main usages of grammar engineering en- 
vironments, namely a pedagogical usage and an 
application-oriented usage, and finally give some 
comments about multi-formalism. 

5.1 Pedagogical usage 

Developing grammars in a pedagogical context 
needs facilities allowing for inspection of the struc- 
tures of the grammar, step-by-step parsing (or gen- 
eration), along with an intuitive interface. The idea 
is to abstract away from technical aspects related to 
implementation (intermediate data structures, opti- 
mizations, etc.). 

The question whether to provide graphical or
text-based editors can be discussed. As advocated
by Baldridge et al. (2007), a low-level text-based
specification can offer more flexibility and
bring less frustration to the grammar designer, especially
when such a specification can be graphically
interpreted. This is the approach chosen
by XMG, where the grammar is defined via an
(advanced or not) editor such as gedit or emacs.
Within TuLiPA, we chose to go further by using
the Eclipse platform. Currently, it allows for displaying
a summary of the content of a metagrammar
or lexicon on a side panel, while editing these
on a middle panel. These two panels are linked
via a jump functionality. The next steps concern
(i) the plugging of a graphical viewer to display
the (meta)grammar structures independently from
a given parse, and (ii) the extension of the Eclipse
plug-in so that one can easily and consistently modify
entries of the metagrammar or lexicon (especially
when these are split over several files).

¹² See http://dyalog.gforge.inria.fr
¹³ See http://trac.loria.fr/~semconst
¹⁴ See http://www.loria.fr/equipes/calligramm[...]
¹⁵ See http://www2.parc.com/isl/groups/nltt/xle/
¹⁶ See http://wiki.delph-in.net/moin
¹⁷ See http://milca.sfs.uni-tuebingen.de/A4/Course/trale/
¹⁸ See http://openccg.sourceforge.net/
¹⁹ See http://svn.ask.it.usyd.edu.au/trac/candc/wiki
²⁰ Nonetheless, Beavers (2002) encoded a CCG in the
[...] Description Language.

5.2 Application-oriented usage 

When dealing with applications, one may demand 
more from the grammar engineering environment, 
especially in terms of efficiency and robustness 
(support for larger resources, partial parsing, etc.). 

Efficiency needs optimizations in the parsing 
engine making it possible to support grammars 
containing several thousands of structures. One 
interesting question concerns the compilation of a 
grammar either off-line or on-line. In DyALog's 
approach, the grammar is compiled off-line into 
a logical automaton encoding all possible derivations.
This off-line compilation can take several
minutes with a TAG having 6000 trees, but the resulting
parser can parse sentences within a second.

In TuLiPA's approach, the grammar is compiled
into an RCG on-line. While giving satisfactory results
on reduced resources,²¹ it may lead to trouble
when scaling up. This is especially true for
TAG (the TT-MCTAG formalism is by definition a
factorized formalism compared with TAG). In the
future, it would be useful to look for a way to pre-compile
a TAG into an RCG off-line, thus saving
the conversion time.

Another important feature of a grammar engineering
environment is its debugging functionalities.
Among these, one may cite unit and
integration testing. It would be useful to extend
the TuLiPA system to provide a module for generating
test-suites for a given grammar. The idea
would be to record the coverage and analyses of
a grammar at a given time. Once the grammar is
further developed, these snapshots would allow for
regression testing.
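Such snapshot-based regression testing could look as follows. This Python sketch is only an illustration of the idea: `count_parses` is a hypothetical callable standing in for a call to the parser that returns the number of analyses of a sentence, and no such module currently exists in TuLiPA.

```python
def take_snapshot(count_parses, test_suite):
    """Record the number of analyses found for each test sentence."""
    return {sentence: count_parses(sentence) for sentence in test_suite}


def find_regressions(old_snapshot, new_snapshot):
    """Return the sentences whose number of analyses changed or disappeared,
    mapped to (old count, new count)."""
    return {
        sentence: (old_count, new_snapshot.get(sentence))
        for sentence, old_count in old_snapshot.items()
        if new_snapshot.get(sentence) != old_count
    }
```

A snapshot taken before a grammar change is compared with one taken afterwards; an empty result means that coverage and ambiguity are unchanged on the test suite.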

5.3 About multi-formalism 

We already mentioned that TuLiPA was opening 
a way towards multi-formalism by relying on an 
RCG core. It is worth noticing that the XMG 
system was also designed to be further extensi- 
ble. Indeed, a metagrammar in XMG corresponds 
to the combination of elementary structures. One
may think of designing a library of such structures;
these would depend on the target grammar
formalism. The combinations may represent
general linguistic concepts and would be shared
by different grammar implementations, following
ideas presented by Bender et al. (2005).

6 Conclusion 

In this paper, we have presented a multi-formalism 
parsing architecture using RCG as a pivot formal- 
ism to parse mildly context-sensitive formalisms 
(currently TAG and TT-MCTAG). This system has 
been designed to facilitate grammar development 
by providing user-friendly interfaces, along with 
several functionalities (e.g., dependency extrac- 
tion, derivation/derived tree display and semantic 
calculus). It is currently used for developing a core 
grammar for German. 

At the moment, we are working on the extension 
of this architecture to include a fully functional 
Eclipse plug-in. Other current tasks concern op- 
timizations to support large scale parsing and the 
extension of the syntactic and semantic coverage 
of the German grammar under development. 

In the near future, we plan to evaluate the parser
and the German grammar (parsing time, correctness
of syntactic and semantic outputs) with respect
to a standard test-suite such as the TSNLP
(Lehmann et al., 1996).



²¹ For a TT-MCTAG counting about 300 sets of trees and a
hand-crafted lexicon made of about 300 words, a 10-word
sentence is parsed (and a semantic representation computed)
within seconds.
[Figure 6 content: screenshot of the Eclipse plug-in showing an XMG metagrammar being edited (tree fragment class Determiner and tree/tuple builder class NounDeterminer).]